Objective Bayesian statistics 



O.-A. Al-Hujaj and H.L. Harney 
MPI für Kernphysik, Postfach 103980, 69029 Heidelberg, Germany

(February 2, 2008) 



Abstract 

Bayesian inference, although becoming popular in physics and chemistry, has so
far been hampered by the vagueness of its notion of prior probability. Some of
its supporters argue that this vagueness is the unavoidable consequence of the
subjectivity of judgements, even scientific ones. We argue that priors can be
defined uniquely if the statistical model at hand possesses a symmetry and if
the ensuing confidence intervals are subjected to a frequentist criterion.
Moreover, it is shown via an example taken from recent experimental nuclear
physics that this procedure can be extended to models with broken symmetry.

INTRODUCTION



Bayesian inference is becoming popular in the physical sciences. It finds eloquent and
noteworthy defenders [?], it is treated in a growing number of textbooks [?], and an annual
series of conferences [?] as well as numerous articles report on its applications. Even the
"Guide to the expression of uncertainty in measurement" [?], supported by the International
Organization for Standardization (ISO), the Bureau International des Poids et Mesures
(BIPM) and other international organizations, implicitly favors Bayesian statistics, as
D'Agostini points out (in Secs. 2.2 and 14.2 of [?]). Some authors who have collaborated in
the formulation of the Guide [?] and of the German standard [?] adhere in their publications
to "a Bayesian theory of measurement uncertainty" [11]. The change of paradigm from
frequentist to Bayesian statistics (for this antagonism see e.g. the introduction of [11] and
ch. 7 of [12]) is taking place despite the fact that the disturbing vagueness of Bayesian prior
probabilities persists to this day. D'Agostini eloquently holds that the prior probabilities
are the mathematical representation of an unavoidable subjectivity of judgments, even
of scientific ones. In our opinion, however, a criterion exists that prior probabilities should
meet, at least approximately, and which in many cases does not leave any freedom
in their choice. This criterion is a consequence of the fact that Bayesian inference has
a frequentist interpretation, as will be explained below. This fact has been pointed
out earlier (by Welch, Peers, Stein and Villegas [13]-[17]) in the literature of mathematical
statistics, but it seems to be little known in practice. In the present note, the criterion is
described together with the circumstances under which it is met. These amount to the
existence of a symmetry of the model that states the relation between event and hypothesis.
The Bayesian prior is then a Haar measure of the symmetry group. With the help of a realistic
example taken from current nuclear physics, it is shown that this procedure can be extended
to the usual case of models with broken symmetry. Finally, the same example shows that the
popular likelihood method, as described in [?], yields confidence intervals of lower
reliability than the Bayesian procedure.



BAYESIAN STATISTICS 

Bayesian statistics is a very useful tool for statistical inference. Let $x$ denote the
continuous event and $\xi$ the continuous hypothesis. Suppose that the conditional probability
$p(x|\xi)dx$, the above mentioned model, as well as the event $x$ are given. The problem
of statistical inference is: What can we learn from $x$ about $\xi$? The Bayesian answer [19]
is a conditional probability $P$ for $\xi$ given $x$, which can be expressed in terms of the
given distribution $p$,

$$ P(\xi|x) = \frac{p(x|\xi)\,\mu(\xi)}{\int d\xi'\, p(x|\xi')\,\mu(\xi')} . \qquad (1) $$

Here, $\mu(\xi)$ is the prior distribution assigned to $\xi$ in the absence of or "prior to" any
experimental evidence. Hence, Bayesian statistics relies on the assumption that $\mu(\xi)$ is a
meaningful object.
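
As a minimal numerical sketch of the Bayesian inversion described above, the posterior can be evaluated on a grid of hypotheses. The zero-mean Gaussian model, the grid, the event value and the two priors used below are assumptions chosen only for illustration; the function names are hypothetical.

```python
import math

def gaussian(x, xi):
    """Zero-mean Gaussian density in x with r.m.s. value xi (assumed model)."""
    return math.exp(-0.5 * (x / xi) ** 2) / (math.sqrt(2.0 * math.pi) * xi)

def posterior_on_grid(x, grid, model, prior):
    """Model times prior, normalised by a Riemann sum over an equidistant grid."""
    step = grid[1] - grid[0]
    weights = [model(x, xi) * prior(xi) for xi in grid]
    norm = sum(weights) * step
    return [w / norm for w in weights]

# Illustrative setup: one event x and a finite grid over the hypothesis.
grid = [0.05 + 0.01 * k for k in range(2000)]
x = 1.0
post_flat = posterior_on_grid(x, grid, gaussian, lambda xi: 1.0)
post_scale = posterior_on_grid(x, grid, gaussian, lambda xi: 1.0 / xi)

# Same event and same model, but the two priors put the maximum elsewhere.
peak_flat = grid[post_flat.index(max(post_flat))]
peak_scale = grid[post_scale.index(max(post_scale))]
print(peak_flat, peak_scale)
```

The finite grid also acts as a cutoff: with the constant prior, this particular model would not be normalisable on an unbounded range of the hypothesis.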

The choice of the prior is the problem of Bayesian theory, because it is not clear what
we can know "prior" about $\xi$.

SUBJECTIVE INTERPRETATION



The view held e.g. in the recent article by D'Agostini [?] or the recent textbook by
Howson and Urbach is that the choice of $\mu(\xi)$ should be left to the good taste and the
experience of the scientist analyzing the event $x$. They argue that such a subjective element
should be part of any honest theory of inference, since it makes explicit the subjectivity of
any judgement, including the scientific ones: nothing can be known about $\xi$ in an objective
way prior to the event. In the subjective interpretation, Bayesian probabilities reflect
personal confidence in hypotheses. This confidence can be expressed in bets (see [?]).

This view does not in principle preclude objectivity altogether: Objectivity exists as the
limiting consequence of an infinite number of recorded events. Indeed one can show that in
this limit Bayesian inference becomes independent of $\mu(\xi)$ in the sense that $P(\xi|x)$
then tends towards a $\delta$-distribution with respect to $\xi$, centered at the true value
of the hypothesis.



OBJECTIVE BAYESIAN STATISTICS 

The subjective interpretation, taken literally and without appealing to some common
sense, allows anything. This is obvious from eq. (1): if one is free to choose $\mu(\xi)$, one
can generate any $P(\xi|x)$, given a finite number of events. To avoid this, we prefer to look
for some criterion that would severely restrict the class of allowed priors. Fortunately there
is a very natural one.

Consider a Bayesian confidence area $A(x,C)$. It shall satisfy

$$ \int_{A(x,C)} d\xi\, P(\xi|x) = C , \qquad (2) $$

which is usually stated in the form: "given the event $x$, the hypothesis lies with confidence
$C$ in the area $A$". We want to reformulate this in a way which turns the vague notion of
confidence into probability and by the same token defines the desired criterion.

Imagine an ensemble $X$ of events $x$ with relative frequencies $p(x|\xi)dx$. Let $x$ run over
the ensemble and suppose that from every $x$ the confidence area $A(x,C)$ is derived in a
unique way, e.g. by determining the smallest one. This yields an ensemble of confidence areas.
The criterion then is: The prior must lead to an ensemble of confidence areas such that they
cover the true value of $\xi$ with probability $C$.
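
The criterion can be checked by simulation. The following is a minimal sketch under assumptions that are not taken from the text: a toy model in which the event is Gaussian around the hypothesis with unit width (so the symmetry is a common shift of event and hypothesis), a constant prior, and hypothetical names throughout.

```python
import random
from statistics import NormalDist

def coverage(true_xi, confidence, n_events, rng):
    """Fraction of Bayesian intervals that cover true_xi.

    Assumed toy model: x is Gaussian with mean xi and unit width, and the
    prior is constant. The posterior is then Gaussian around x with unit
    width, so the shortest interval of the given confidence is
    (x - half_width, x + half_width).
    """
    half_width = NormalDist().inv_cdf(0.5 + 0.5 * confidence)
    hits = 0
    for _ in range(n_events):
        x = rng.gauss(true_xi, 1.0)          # one event from the ensemble
        if x - half_width <= true_xi <= x + half_width:
            hits += 1
    return hits / n_events

rng = random.Random(12345)
freq_a = coverage(true_xi=0.0, confidence=0.68, n_events=100_000, rng=rng)
freq_b = coverage(true_xi=7.5, confidence=0.68, n_events=100_000, rng=rng)
print(freq_a, freq_b)
```

In this toy case the frequency of coverage should agree with the prescribed confidence up to statistical fluctuations, whatever true value is chosen.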

Since this gedanken experiment, which can even be realized via Monte Carlo simulation,
equates $C$ with a well defined frequency (to cover $\xi$), the criterion turns confidence
into frequentist probability. In short: We require that Bayesian confidence areas are
frequentist confidence areas. It is proven in the mathematical literature that this criterion
can be met exactly: Let the conditional probability $p(x|\xi)dx$ be invariant under a Lie group
$G$ represented as transformations of $x$ and $\xi$, i.e.

$$ p(x|\xi)\,dx = p(G_\rho x|G_\rho \xi)\,dG_\rho x , \qquad G_\rho \in G , \qquad (3) $$

and suppose furthermore that the definition of the confidence area $A$ is invariant under $G$,
i.e.

$$ A(G_\rho x, C) = G_\rho A(x,C) ; \qquad (4) $$

then the right Haar measure of the symmetry group is a suitable prior in the sense of the
criterion. For this it is necessary that the hypothesis can be identified with the symmetry
group of the conditional probability, i.e. for every two hypotheses $\xi_1, \xi_2$ there must
be exactly one transformation $G_\rho \in G$ such that $\xi_1 = G_\rho \xi_2$. Then the
uniqueness of the right Haar measure implies the uniqueness of the prior and one has



$$ \mu(\xi) = \left| \frac{\partial (G_\rho \xi)}{\partial \rho} \right|^{-1}_{\rho = e} , \qquad (5) $$

which is the inverse Jacobian of the transformation $G_\rho$ taken at the unit element $e$ of
the group.

Note that the very interesting case in which $A$ is the smallest area of confidence $C$
satisfies eq. (4) if the volume of $A$ is determined with the help of the measure that is
invariant under $G$, i.e. the left Haar measure.

The prior distribution is not independent of the statistical model; rather it is defined
by the structure of the model (see [?], [?]). It is prior to the observations. Since the above
symmetry uniquely defines the prior, since it ensures a frequentist interpretation of
Bayesian confidence intervals, and since frequentist probability is often termed "objective"
probability, one may call the present procedure objective Bayesian statistics. However, we
return to the word "Bayesian", implying for the rest of the paper the qualification
"objective". By the following example, we show that this concept can be extended to models
that lack the symmetry (3).



EXAMPLE 

The measurement problems encountered in science often do not have a symmetry (3).
However, Bayesian inference based on a multi-dimensional event can usually be broken down
into a succession of Bayesian arguments on more elementary events until a starting point
that possesses a symmetry has been found. To be definite, let us consider the parity violating
matrix elements measured by the TRIPLE collaboration in resonant p-wave scattering of
polarized neutrons on heavy nuclei [21]-[24]. For the present purpose it is not necessary to
understand the details of those experiments. It suffices to know that the parity violating
matrix elements, i.e. the events, have the Gaussian probability

$$ g(x|\sqrt{M^2+\varepsilon^2})\,dx = \frac{dx}{\sqrt{2\pi (M^2+\varepsilon^2)}} \exp\left( -\frac{x^2}{2(M^2+\varepsilon^2)} \right) , \qquad (6) $$

where $\varepsilon$ is an experimental error, supposed to be given, and $M$ is the root mean
square parity violating matrix element. The latter is the "hypothesis" to be determined.
There is no symmetry (3) relating $x$ and $M$. However, the probability of eq. (6) remains
invariant under a simultaneous change of the scale of $x$ and $\xi = \sqrt{M^2+\varepsilon^2}$.
Hence, Bayesian inference should be done with respect to $\xi$ rather than $M$. Afterwards,
when the posterior distribution of $\xi$ has been constructed, one can derive statements about
$M$. E.g. one can decide whether $\xi$ is larger than $\varepsilon$ with sufficiently high
confidence and thus $M > 0$. The symmetry considerations require



$$ \mu(\xi) = \frac{1}{\xi} \qquad (7) $$



as prior, which is the Haar measure of the Lie group of scale changes. 
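
The frequentist property of this prior can be illustrated numerically. The sketch below assumes a single event drawn from the scale-invariant Gaussian of eq. (6) and uses the prior 1/xi; for that case the posterior distribution function works out in closed form as erfc(|x| / (sqrt(2) xi)). Instead of the shortest interval, an equal-tailed interval is used here as a simplification, since it also transforms properly under scale changes. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def central_interval(x, confidence):
    """Equal-tailed interval for xi from the posterior with prior 1/xi.

    For one event x the posterior distribution function is
    F(xi|x) = erfc(|x| / (sqrt(2) * xi)); it is inverted here via the
    standard normal quantile function.
    """
    nd = NormalDist()
    a_low = nd.inv_cdf(1.0 - 0.25 * (1.0 - confidence))   # fixes the lower end
    a_high = nd.inv_cdf(1.0 - 0.25 * (1.0 + confidence))  # fixes the upper end
    return abs(x) / a_low, abs(x) / a_high

def coverage(true_xi, confidence, n_events, rng):
    """Relative frequency with which the interval contains true_xi."""
    hits = 0
    for _ in range(n_events):
        x = rng.gauss(0.0, true_xi)          # event with r.m.s. value true_xi
        lo, hi = central_interval(x, confidence)
        if lo <= true_xi <= hi:
            hits += 1
    return hits / n_events

rng = random.Random(2024)
freq_small = coverage(0.3, 0.68, 100_000, rng)
freq_large = coverage(40.0, 0.68, 100_000, rng)
print(freq_small, freq_large)
```

Under these assumptions the frequency of success should not depend on the true scale and should match the prescribed confidence up to statistical fluctuations.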



The problem of [21]-[24] is complicated by the fact that one usually does not know the
total angular momentum of the p-wave resonance. Only $p_{1/2}$ resonances can show parity
violation. If the event $x$ is gathered in a $p_{3/2}$ resonance, nothing can be learned about
$M$ and the distribution of the event is $g(x|\varepsilon)$. One knows, however, the probability
$q_p$ of the occurrence of $p_{1/2}$ resonances. Again it is not necessary to discuss here the
nuclear physics that creates this complication. It suffices to state that the event $x$ follows
with probability $q_p$ the distribution $g(x|\xi)$ and with probability $1-q_p$ the distribution
$g(x|\varepsilon)$, so that one has

$$ p(x|\xi) = q_p\, g(x|\xi) + (1-q_p)\, g(x|\varepsilon) . $$

The presence of the second term on the right hand side precludes any symmetry (3) relating
$x$ and $\xi$, except in the limit of $\varepsilon \to 0$, when $g(x|\varepsilon) \to \delta(x)$
and $p(x|\xi)$ recovers the symmetry under scale changes. The full experiment of [21]-[24]
probes $n$ resonances $i = 1, \ldots, n$ under varying experimental conditions, so that
$\varepsilon = \varepsilon_i$ becomes a function of the resonance $i$. This alone precludes any
symmetry of the multidimensional problem.

If, however, one knows for one of the resonances that it is a $p_{1/2}$ case, say for $i = 1$,
then the event measured there has the distribution $g(x_1|\xi)$; the symmetry is scale
invariance and the prior is (7). One can use the event $x_1$ to construct the Bayesian inverse
$P_1(\xi|x_1)$; it can be injected as prior distribution into the analysis of the results at
$i = 2, \ldots, n$. This procedure leads to a posterior distribution $P(\xi|x_1, \ldots, x_n)$.
In this way, one finds a starting point of the whole analysis which has a well defined symmetry
and therefore a well defined prior. Let us call the whole procedure an approximate Bayesian
(AB) one.
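
The succession of Bayesian steps can be sketched on a grid. This is only an illustration under several assumptions that go beyond the text: the posterior is written as a function of M rather than xi (because the errors differ between resonances), the scale-invariant prior of the starting resonance is transformed to M accordingly, the integral is replaced by a Riemann sum, and the data values are invented for the example.

```python
import math

def gaussian(x, width):
    """Zero-mean Gaussian density in x with r.m.s. value width."""
    return math.exp(-0.5 * (x / width) ** 2) / (math.sqrt(2.0 * math.pi) * width)

def ab_posterior(xs, eps, start, q_p, m_grid):
    """Approximate Bayesian posterior of M on an equidistant grid.

    `start` indexes the resonance assumed to be a p_1/2 case. Its model is
    scale invariant in xi = sqrt(M^2 + eps[start]^2), so the prior d(xi)/xi
    is used there; expressed in M it reads M dM / (M^2 + eps[start]^2).
    All other resonances enter through the two-component mixture.
    """
    step = m_grid[1] - m_grid[0]
    post = []
    for m in m_grid:
        xi2 = m * m + eps[start] ** 2
        w = (m / xi2) * gaussian(xs[start], math.sqrt(xi2))
        for i, (x, e) in enumerate(zip(xs, eps)):
            if i == start:
                continue
            w *= (q_p * gaussian(x, math.sqrt(m * m + e * e))
                  + (1.0 - q_p) * gaussian(x, e))
        post.append(w)
    norm = sum(post) * step
    return [w / norm for w in post]

# Hypothetical numbers, chosen only for illustration.
xs = [2.1, 0.3, -0.4, 1.5]
eps = [0.5, 0.4, 0.6, 0.5]
m_grid = [0.01 * (k + 1) for k in range(3000)]
posterior = ab_posterior(xs, eps, start=0, q_p=1.0 / 3.0, m_grid=m_grid)
print(m_grid[posterior.index(max(posterior))])
```

Multiplying all factors at once is equivalent to injecting the posterior of the starting resonance as the prior for the remaining ones, step by step.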

It is not our purpose to reanalyze the data of refs. [21]-[24]. We only want to demonstrate
that the AB analysis satisfies the criterion described above to a good approximation. For
this purpose, the AB analysis has been subjected to a Monte Carlo test with parameters
close to those of the experimental cases [21]-[24]. The number of resonances was chosen to be
$n = 15$. The errors $\varepsilon_i$, $i = 1, \ldots, 15$, have been drawn from an exponential
distribution with mean value $rM$. This allows one to study (in Fig. 1) the result as a
function of $r = \bar{\varepsilon}/M$, the mean error relative to the true value $M$ of the
r.m.s. parity violating matrix element. In the experiments [21]-[23], $r$ ranged from 0.23 up
to the order of unity. The coordinates $x_i$ of the event $x = (x_1, \ldots, x_{15})$ were
generated in two steps. First one decides with probability $q_p = 1/3$ whether the quantity
$x_i$ should belong to the $p_{1/2}$-wave resonances. If yes, then $x_i$ was drawn from an
ensemble with distribution $g(x_i|\sqrt{M^2+\varepsilon_i^2})$; otherwise the distribution
$g(x_i|\varepsilon_i)$ was applied. The vector $x$, without the information from which of the
two ensembles any one of the $x_i$ comes, is equivalent to the experimental "event".
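
The generation of one synthetic event, as described above, can be sketched as follows. The function name, the argument names, the seed and the numerical values in the usage lines are assumptions made for the example; only the two-step recipe itself is taken from the text.

```python
import math
import random

def draw_event(m_true, r, n_res, q_p, rng):
    """Draw one synthetic event (x_1..x_n) together with its errors."""
    xs, eps = [], []
    for _ in range(n_res):
        e = rng.expovariate(1.0 / (r * m_true))   # exponential with mean r*M
        if rng.random() < q_p:
            # p_1/2 resonance: the event carries information on M
            width = math.sqrt(m_true ** 2 + e ** 2)
        else:
            # other resonance: the event only reflects the error
            width = e
        xs.append(rng.gauss(0.0, width))
        eps.append(e)
    # The label of the component is deliberately not returned.
    return xs, eps

rng = random.Random(7)
xs, eps = draw_event(m_true=1.0, r=0.5, n_res=15, q_p=1.0 / 3.0, rng=rng)
print(len(xs), len(eps))
```

Discarding the component label mirrors the experimental situation, in which the total angular momentum of a resonance is usually unknown.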

The event $x$ was analyzed by assuming that the "resonance" $i_m$, where the maximum of
the ratio $x_i/\varepsilon_i$ occurs, is a $p_{1/2}$ resonance and can serve as the starting
point of the analysis in the above sense. With the help of the posterior distribution
$P(\xi|x)$, the shortest confidence interval $(\xi_<, \xi_>)$ was found such that

$$ \int_{\xi_<}^{\xi_>} d\eta\, P(\eta|x_1, \ldots, x_n) = 0.68 $$

and it was recorded whether the true value $M$ was inside
$(\max(0, \sqrt{\xi_<^2-\varepsilon^2}), \max(0, \sqrt{\xi_>^2-\varepsilon^2}))$. This procedure
was applied to $10^4$ vectors $x$. In Fig. 1, the symbols labeled "Bayes" give the relative
frequency of "success", i.e. the probability to find $M$ in the above mentioned range, for
different values of $r$, which controls the experimental error. In this way, the curve in the
figure was generated. (The full and dashed lines are 4th order polynomials fitted to the
points.) Because of the lack of symmetry, the result is not identical but only close to 0.68
(the dotted line). However, in the limit of $r \to 0$, which means $\varepsilon_i \to 0$ for
all $i$, the criterion is obeyed exactly, as it should be, since scale invariance is recovered
in this limit.

FIG. 1. Bayesian inference is superior to the likelihood method: The frequency with which the
true value of the parity violating matrix element lies in the 68% confidence interval is
plotted against the size of the experimental error. Details are explained in the text.

In comparison, a "likelihood" analysis was performed. This type of analysis is very
popular. It amounts to the Bayesian procedure with the prior distribution set constant.
In the present case, we had to cut off the posterior distribution at large $M$ in order to
normalize it, very much as in [21]-[23]. The figure shows that the likelihood method is
inferior even to the approximate Bayesian method. For good data, i.e. $r \to 0$, the likelihood
method yields confidence intervals that are, in the light of the criterion, too wide. For data
of marginal quality ($r \approx 1$), it yields no reliable confidence intervals, because the
frequency of successes is considerably lower than the prescribed confidence.



SUMMARY 



We have defined objective Bayesian statistics by supplementing the Bayesian argument
with the requirement that it should yield "objectively correct" confidence intervals. A
theorem by Stein, which can be found in the published mathematical literature [?] but
which seems to be unknown in practice, shows that this requirement can be met provided
that the conditional probability $p(x|\xi)dx$ possesses the symmetry (3) defined by a Lie group.
By way of an example we have extended objective Bayesian statistics to the common case
in which $p(x|\xi)dx$ does not have an exact symmetry. In the example, a complex event
$x = (x_1, \ldots, x_n)$ is broken down into elementary ones $x_i$, among which there is at
least one, say $x_1$, whose conditional probability $p(x_1|\xi)dx$ possesses a symmetry (3). It
is used to define the prior $\mu(\xi)$. This has been termed the approximate Bayesian
procedure. We have shown numerically that the AB procedure is superior to the popular
likelihood method as judged by the objectivity of the deduced confidence intervals.

Note that the arguments presented here amount to a reconciliation of the subjective and
frequentist interpretations of probability. The Bayesian argument attributes a distribution
to an object, i.e. the hypothesis, which is given by Nature once and for all. This is justified
by interpreting probability distributions as a representation of subjective knowledge about
that object. The frequentist interpretation insists that a probability distribution must be
verifiable, at least in a gedanken experiment, as a frequency distribution that occurs in some
stochastic process. We have described a gedanken experiment to generate a distribution of
Bayesian confidence intervals from data that are conditioned by a fixed true value of the
hypothesis. The rate of success, i.e. of the true value lying inside the confidence interval,
turns out to be independent of the true value. Moreover, the rate of success is equal to the
confidence prescribed in the Bayesian procedure if the conditional distribution possesses
the symmetry (3) and if the prior is chosen to be the right Haar measure. Then the Bayesian
inference is found reasonable from a frequentist's point of view.